[57] ABSTRACT 

A hardware and software mechanism is provided for 
ensuring that a feature processor card, included with 
other feature cards in a host system, can be reset with- 
out interrupting software running on other feature 
cards. A delay is provided that starts counting each time 
a watchdog timer expires. If the watchdog timer is reset 
by an interrupt service routine, then the feature card 
processor is assumed to be reset. But, if the watchdog 
timer is not reset before the delay timer expires, then it 
is assumed that the service routine is corrupt and that exter- 
nal reset of the feature card is required. Upon expiration 
of the watchdog, an error signal is sent, via the system 
bus, to the host CPU. Recovery code that is resident on 
the host CPU is then run and resets the CPU on the 
feature card. A reset signal is output from the host CPU, 
via the system bus, to a reset register on the feature card 
which then forwards the signal to the feature card CPU, 
thereby initiating reset of the system. 

14 Claims, 5 Drawing Sheets 

SYSTEM CRASH DETECT AND AUTOMATIC 
RESET MECHANISM FOR PROCESSOR CARDS 

BACKGROUND OF THE INVENTION 
1. Field of the Invention 

This invention generally relates to the individual 
reset of one of a plurality of processor cards installed on 
a host system, or a server. More particularly, an individ- 
ual workstation consisting of a processor card that is 
physically located in a host machine with a plurality of 
other processor cards, each using a particular software 
operating system, can be reset without adversely affect- 
ing the operations of other workstations or the host 
system. 

2. Description of Related Art 

Generally, IBM and IBM compatible computers, 
such as the PS/2 (PS/2 is a trademark of IBM Corpora- 
tion) use a CTRL-ALT-DEL (Control-Alternate- 
Delete) keyboard sequence to reset the system, typi- 
cally after the software crashes, for reasons such as 
defective application program software, or the like. 
However, on some occasions a software crash will 
occur in such a way that the CTRL-ALT-DEL se- 
quence cannot be serviced by a reset routine and the 
only method of resetting the system is to turn the ma- 
chine off and then on again. 

It is known that multiple feature cards, each having 
its own processor, can be installed in a computer that 
is configured as a host machine. Each of the feature 
cards will independently run an operating system on its 
processor and support a separate user. Thus, it can be 
seen that if the feature card crashes and the standard 
CTRL-ALT-DEL sequence fails to reset the system, 
the user needs another means of resetting the feature 
card processor. The user cannot utilize another terminal 
in the system to reset the crashed card, since all other 
terminals may be in use. Further, the entire host ma- 
chine cannot be powered off and then on again, since all 
of the programs being run by other users on the system 
would be reset. A special reset switch could be hard- 
wired from each terminal to the corresponding feature 
card, but this would make the terminal equipment, i.e. 
keyboard, display, and the like, non-standard and thus 
incompatible with equipment from other manufactur- 
ers. Similarly, a special switch installed on the host 
machine, corresponding to each feature card, is unac- 
ceptable since users are likely to be in a different room 
than the host machine, and the switches may be in an 
inconveniently accessed physical location on the host 
box. 

Therefore, it can be seen that a mechanism for reli- 
ably resetting, after a software crash, an individual fea- 
ture card corresponding to a particular terminal, and 
which is contained in a host machine having multiple 
feature cards, is very desirable. In particular, a mecha- 
nism is needed that will reset only the feature card that 
has crashed, without resetting any of the other cards 
contained in the host, all of which may be running pro- 
gram applications for other system users. 

SUMMARY OF THE INVENTION 

In contrast to the prior art, the present invention 
provides an external reset mechanism to restore the 
feature card after a software crash. As stated above, it is 
possible that the CTRL-ALT-DEL keyboard sequence 
will not reset the system on the feature card. Therefore, 
a timer, buffer and reset register are provided on the 
feature card and utilized along with appropriate soft- 
ware on the host machine and the feature card to ensure 
the reset of each individual feature card. 

Broadly, each feature card (which is essentially a self 
contained personal computer) includes a watchdog 
timer that counts when the system timer goes unser- 
viced. The system timer is used by the operating system 
for various functions, such as timekeeping, task switch- 
ing, and the like. These functions are performed when 
the system timer expires and sends an interrupt to its 
associated CPU. However, if this interrupt goes unser- 
viced, then the watchdog timer expires. Since the sys- 
tem timer is the highest priority interrupt to the local 
CPU on the feature card, it is assumed that the system 
may have crashed if the system timer goes unserviced. 
Normally, when the watchdog timer expires, an inter- 
rupt is output to drive a non-maskable interrupt (NMI) 
reset service routine on the local (feature card) CPU. 
This NMI routine can check the keyboard for a CTRL- 
ALT-DEL sequence. If the CTRL-ALT-DEL is not 
detected, it is assumed that the NMI routine was falsely 
invoked (since a user will normally input the CTRL- 
ALT-DEL sequence if a system crash has actually 
occurred), the watchdog timer is reset, and the NMI 
routine ends. If CTRL-ALT-DEL is detected, then a 
soft reset is initiated by invoking the system initializa- 
tion routine, such as the power on self test (POST). 
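
The internal reset path described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the names WatchdogTimer, nmi_routine and NmiOutcome, and the "limit" parameter (how many unserviced system timer periods are tolerated), are hypothetical and do not appear in the source.

```python
from enum import Enum


class NmiOutcome(Enum):
    FALSE_ALARM = "watchdog reset, NMI routine ends"
    SOFT_RESET = "system initialization routine (e.g. POST) invoked"


class WatchdogTimer:
    """Counts system timer periods whose interrupt went unserviced."""

    def __init__(self, limit=1):
        self.limit = limit      # assumption: unserviced periods tolerated
        self.missed = 0
        self.expired = False

    def system_timer_tick(self, serviced):
        """Record one system timer period; return True once expired."""
        if serviced:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.limit:
                self.expired = True
        return self.expired

    def reset(self):
        self.missed = 0
        self.expired = False


def nmi_routine(watchdog, ctrl_alt_del_pressed):
    """Decision made by the NMI reset service routine on the card CPU."""
    if ctrl_alt_del_pressed:
        # The user asked for a reset: hand over to the initialization routine.
        return NmiOutcome.SOFT_RESET
    # No key sequence: treat the expiry as a false alarm and re-arm the watchdog.
    watchdog.reset()
    return NmiOutcome.FALSE_ALARM
```

The sketch deliberately leaves out the failure case that motivates the invention: if the routine itself is corrupted, nmi_routine never runs and the watchdog stays expired.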

However, if the feature card has in fact crashed it is 
possible that the NMI service routine is corrupted and 
an external reset mechanism must be available, since 
powering the host machine off, and then on, is not possi- 
ble due to the presence of multiple users. 

This external reset mechanism includes a delay timer 
that starts counting each time the watchdog timer ex- 
pires. When the watchdog timer expires it may be reset 
by the NMI service routine, in which case the feature 
card processor is assumed to have been running prop- 
erly. But, if the watchdog timer is not reset before the 
delay period expires, then the NMI service routine may 
be corrupt and external reset of the feature card is re- 
quired. Upon expiration of the watchdog, an error sig- 
nal is sent via the system bus, to the host CPU. Recov- 
ery code that is resident on the host CPU is then run 
which resets the CPU on the feature card. A reset signal 
is output from the host CPU, via the system bus, to a 
reset register on the feature card which then forwards 
the reset signal to the feature card CPU, thereby initiat- 
ing reset of the system. 

In accordance with the previous summary, objects, 
features and advantages of the present invention will 
become apparent to one skilled in the art from the subse- 
quent description and the appended claims taken in 
conjunction with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram showing the host machine 
having plural feature cards therein and associated user 
terminals; 

FIG. 2 is a schematic diagram of the components of a 
standard feature card; 

FIG. 3 is a flow chart of the steps representing the 
software function required to reset the feature card 
CPU of FIG. 2; 

FIG. 4 is a schematic diagram showing the present 
invention including its components and interconnection 
with the host CPU; and 

FIG. 5 is a flow chart showing the functionality 
added to the multiprocessor system by the present in- 
vention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

Referring to FIG. 1, a multiple user system is shown 
and noted by reference numeral 1. A host computer 3 is 
shown which may be any of a number of commercially 
available computers such as a PS/2 system, or the like. 
Host computer 3 includes central processing unit 5 
which may be one of the Intel X86 family of micro- 
processors. Feature cards 7 are also included within 
host 3 and will be described further with reference to 
FIGS. 2 and 4. It should be noted that feature cards 7 
are insertable and electrically connectable into host 
computer 3 via expansion slots, or the like. Addition- 
ally, it should be understood that while three feature 
cards 7 are shown for exemplary purposes, any number 
of feature cards can be used and are contemplated by 
the scope of the present invention. A system card 9 is 
provided and includes a bus interface that may be the 
MicroChannel bus, utilized by the IBM Corporation 
(MicroChannel is a trademark of the IBM Corp.). Pe- 
ripheral devices are also included within the host sys- 
tem 3 and connected to system card 9 and may include 
a memory device 11 such as a disk storage device, e.g. 
a floppy type disk or a hard drive. Additionally, a 
printer 13 is shown and may be provided as part of the 
host system. Further, it should be noted that other in- 
put/output (I/O) peripheral devices, such as a keyboard 
and display (not shown) can be utilized and intercon- 
nected with the host system 3 if desired. Also shown in 
FIG. 1, are plural workstations 15 which may include a 
display 17, keyboard 19 and other peripheral devices, 
such as a mouse, floppy disk drive, serial port, or the 
like. Again, three workstations 15 are shown, but it is to 
be understood that any number of plural workstations 
15 are contemplated by the present invention. It also 
can be seen from viewing FIG. 1, that each workstation 
15 is associated with a corresponding one of feature 
cards 7 located in host system 3. Generally, feature card 
7 will include a microprocessor, e.g. one of the Intel 
X86 microprocessors such that workstations 15 in con- 
junction with a corresponding feature card 7 are essen- 
tially equivalent to a stand alone personal computer 
system. It can be seen that in an office or business envi- 
ronment, having a single host computer with plural 
feature cards and peripheral devices, associated with 
the feature cards, will utilize economies of scale and 
other efficiencies to provide advantages for customers 
desiring the use of multiple personal computer systems. 
Of course, the cost of a single host computer and plural 
peripheral devices, i.e. displays and keyboards will be 
less than a corresponding number of individual personal 
computer systems. Therefore, it can be seen why a 
system such as shown in FIG. 1 is extremely desirable, 
and it will be further explained below why the present 
invention is needed to provide reset functions for each 
feature card 7 independently in order to allow the most 
efficient use of host system 3. 

FIG. 2 shows a standard feature card which is config- 
ured as an IBM PS/2 system without the benefit of the 
present invention. It can be seen that a central process- 
ing unit 21 is provided, again such as the Intel X86 series 
of microprocessors. A read only memory (ROM) 23 
and random access memory (RAM) 25 are also pro- 
vided and interconnected with CPU 21. A system timer 
29 is provided which is the highest priority interrupt to 
the CPU 21. The system timer 29 is used by the operat- 
ing system for various functions, such as timekeeping, 
task switching, and the like. These functions are per- 
formed when the system timer expires and sends an 
interrupt to its associated CPU 21. Additionally, a 
watchdog timer 27 is provided which counts when the 
system timer 29 does not get serviced within one of its 
clock periods. Thus, when system timer 29 expires, a 
signal is input to watchdog timer 27. Watchdog timer 27 
monitors this input from the system timer for an indica- 
tion that the system timer has been cleared. If the sys- 
tem timer is not cleared (goes unserviced), then watch- 
dog timer 27 expires and it is assumed that the system 
has crashed and watchdog timer 27 outputs an interrupt 
(error) signal to the NMI signal input 37 of CPU 21 
which then invokes a non-maskable interrupt (NMI) 
reset routine (program 35) that can reset the system. 
Additionally, ROM 23 includes a program 31 (A) 
which is an initialization program. RAM 25 includes 
program 33 (B) which is an application program, or an 
operating system that runs under normal conditions. 
RAM 25 also includes program 35 (C) which is the 
NMI interrupt routine that is run when the NMI 37 
signal is triggered by watchdog timer 27. Memory 25, 
NMI program 35 and initialization program 31 may be 
considered as part of an internal reset system of feature 
card 7. 

FIG. 3 is a flowchart showing the normal reset steps 
utilized by feature card 7 or system card 9, shown in 
FIGS. 1 and 2. At step 1 the system is initialized by 
powering the machine on, to invoke initialization pro- 
gram 31 which may be a power on self test (POST) 
routine. At step 2, program 31 loads programs 33 and 35 
into system memory and runs program 33 which is the 
normal implementation of the operating system or a 
program application. Under normal operation, program 
33 will then continue running until the user logs off or 
the machine is powered off. However, at step 3, the 
watchdog timer 27 expires due to a problem in program 
33, or the like and a signal is output to CPU 21 trigger- 
ing NMI interrupt routine 35. At step 4, the non-maska- 
ble interrupt routine 35 is run by CPU 21. Program 35 
(NMI interrupt routine) then determines whether the 
user has input the keyboard sequence Control-Alt- 
Delete and if so the flowchart returns to step 1 and the 
system is reinitialized by running program 31. It is as- 
sumed that if a problem has occurred in program 33 the 
user will in fact input the keyboard sequence Control- 
Alt-Delete. Therefore, if it is determined at step 5 that 
Control-Alt-Delete has not been input by the user, then 
step 6 considers the error to be false and the flowchart 
returns to step 2 where program 33 is continued. How- 
ever, in the case of a system crash, it is possible for the 
NMI reset routine to be corrupted such that program 35 
is not invoked and the presence, or absence, of the Con- 
trol-Alt-Delete keyboard sequence cannot be deter- 
mined. In this case, the NMI routine is corrupt and no 
programs can be properly executed. The conventional 
solution is to power the machine off and on. However, 
as previously noted, this alternative is not possible when 
utilizing a host system of plural feature cards such as is 
shown by FIG. 1. 

Referring to FIG. 4, hardware components and inter- 
connections therebetween of feature card 7 of the pres- 
ent invention and host system 3 are shown. It should be 
noted that like reference numerals refer to identical 
components, as previously described with reference to 
FIGS. 1 and 2. It can be seen that feature card 7 com- 
municates with host 3 through bus 9. A buffer 51 and 
register 53 have been added to feature card 7 along with 
wiring changes required to implement the present in- 
vention. Further, a reset input 52 is shown within CPU 
21 and will be described below in conjunction with the 
operation of the present invention. It can be seen from 
FIG. 4, that upon expiration of watchdog timer 27 an 
error signal is output not only to the NMI input 37 of 
CPU 21 but also to buffer 51 and register 53. The 
watchdog timer signal is then output as an interrupt 
request signal (IRQ) through bus 9 and input as an inter- 
rupt to host CPU 5. RAM 25 of feature card 7 is also 
connected with host CPU 5 via bus 9. CPU 5 may also 
output signals to register 53 of feature cards 7, which 
are ultimately input to reset input 52 of feature card 
CPU 21. A host random access memory 59 is also pro- 
vided and shown connected with CPU 5. RAM 59 
includes a program 57 (E) which is an application pro- 
gram or operating system that runs under normal condi- 
tions. Program 55 (D) is an interrupt service routine 
which is executed when an interrupt is received by host 
CPU 5 through buffer 51. 
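
The signal paths of FIG. 4 can be pictured with a toy model. The following is a minimal sketch, assuming each signal is a simple boolean level; the class name FeatureCardSignals and its method names are illustrative only, and the assumption that a host reset also clears the watchdog signal is not stated in the source.

```python
class FeatureCardSignals:
    """Toy model of the FIG. 4 signal paths (all names are illustrative)."""

    def __init__(self):
        self.nmi_input_37 = False   # NMI input of feature card CPU 21
        self.register_53 = False    # watchdog signal readable by the host
        self.irq_to_host = False    # interrupt request driven through buffer 51
        self.resets_seen = 0        # times reset input 52 has been driven

    def _clear_watchdog_signal(self):
        self.nmi_input_37 = False
        self.register_53 = False
        self.irq_to_host = False

    def watchdog_expires(self):
        # The error signal fans out to the card CPU, buffer 51 and register 53.
        self.nmi_input_37 = True
        self.register_53 = True
        self.irq_to_host = True

    def nmi_routine_resets_watchdog(self):
        # A healthy NMI routine 35 resets the watchdog, dropping the signal.
        self._clear_watchdog_signal()

    def host_reads_register(self):
        # Host CPU 5 reads register 53 over the bus.
        return self.register_53

    def host_writes_reset(self):
        # Host CPU 5 writes register 53, which drives reset input 52.
        self.resets_seen += 1
        # Assumption: resetting the card CPU also clears the watchdog signal.
        self._clear_watchdog_signal()
```

The point of the model is that register 53 is used in both directions: the host reads it to learn whether the watchdog is still active, and writes it to force a reset of the card CPU.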

The operation of the present invention will now be 
described with reference to FIG. 5, as well as FIG. 4. 
At step 1, of FIG. 5, program 55 is invoked which 
assumes that watchdog timer 27 has expired and error 
signals have been input to CPU 21, buffer 51 and regis- 
ter 53. Buffer 51 outputs an interrupt request to host 
CPU 5 which then interrupts the normal operation of 
program 57 and begins the service routine program 55. 
At step 2, a delay mechanism is implemented such that 
program 55 waits for a period of time to ensure that the 
NMI driver program 35 is given ample opportunity to 
reset CPU 21, as previously discussed with regard to 
FIGS. 2 and 3. In a preferred embodiment, this delay 
mechanism is implemented as a software timer, built 
into program 55. After an appropriate length of time, 
which is approximately 1.0 second, program 55 checks 
to see if the watchdog timer 27 is still active. This step 
is accomplished by host CPU 5 reading register 53 to 
check for the presence of the signal from the watchdog 
timer 27 (FIG. 4). If the watchdog timer is not active, 
i.e. a signal is not present in register 53, then the present 
invention assumes program 35 was successful in reset- 
ting CPU 21 or that program 35 determined the error to 
be false and the process returns to the normal operation 
of program 57. However, if it is determined at step 3 
that the watchdog timer 27 has remained active, then 
the process continues to step 4 wherein an integrity 
check is performed on the internal reset system of the 
feature card 7, i.e. the validity of the memory and NMI 
routine is determined. In particular, CPU 5 checks 
the integrity of RAM 25 and whether 
program 35 has been corrupted (invalid). If the validity 
check shows that RAM 25 and program 35 are intact, or 
valid, then the process of FIG. 5 returns to step 2 
wherein the time delay is again implemented prior to 
checking register 53 for the presence of the signal 
that will indicate whether the watchdog timer is active. 
It will be understood by those skilled in the art that this 
integrity check can be performed using any one of a 
number of known techniques. For example, bit parity 
checking the code of the NMI routine 35 (i.e. check- 
sum) can be utilized. Additionally, maintaining a true 
copy of the NMI software code in host CPU 5 and 
performing a comparison with the NMI code in feature 
card 7 will provide an integrity check. It should be 
noted that the process of the present invention will 
return to step 2 (from step 4) for various reasons, such as 
the delay period not being long enough for the watch- 
dog to be reset. If at step 4, it is determined that RAM 
25 and program 35 are not intact, i.e. they are cor- 
rupted, then at step 5 program 55 resets CPU 21 of 
feature card 7. This is accomplished by CPU 5 output- 
ting a reset signal to register 53 which in turn outputs 
the reset signal to CPU 21 at reset input point 52. CPU 
21 then begins normal initialization operations, i.e. pro- 
gram 31 is run. Subsequent to reset of the feature card, 
the process of FIG. 5 then continues to step 6 where 
control of the host CPU is returned to program 57 and 
normal operations continue. 
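
The steps of FIG. 5 can be sketched as a host-side routine. This is an illustrative sketch, not the code of program 55: the function name recovery_routine, the callback parameters, and the returned strings are hypothetical. Two assumptions depart from the text and are flagged in the comments: CRC-32 stands in for the checksum or comparison against a true copy mentioned above, and a max_rounds limit is added so the example always terminates, whereas the described process simply returns to step 2.

```python
import time
import zlib


def recovery_routine(read_error_signal, read_nmi_code, reference_crc,
                     reset_card, delay_s=1.0, max_rounds=10,
                     sleep=time.sleep):
    """Host-side handling after a feature card watchdog expires.

    read_error_signal: callable returning True while register 53 still
        holds the watchdog signal (hypothetical accessor).
    read_nmi_code: callable returning the card's NMI routine as bytes.
    reference_crc: CRC-32 of a known-good copy kept by the host
        (assumption: CRC-32 is used here as the integrity check).
    reset_card: callable that writes the reset signal to register 53.
    max_rounds: assumption, not in the source; bounds the retry loop.
    """
    for _ in range(max_rounds):
        sleep(delay_s)                              # step 2: delay period
        if not read_error_signal():                 # step 3: watchdog active?
            return "card recovered"                 # NMI routine handled it
        if zlib.crc32(read_nmi_code()) != reference_crc:   # step 4: validity
            reset_card()                            # step 5: external reset
            return "card reset by host"
        # Internal reset system is intact: wait again (back to step 2).
    return "still waiting"
```

In every case control then returns to the host's normal program (step 6). Passing the delay function as a parameter keeps the sketch testable without real waiting.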

Thus, it can be seen that the present invention allows 
single feature cards running in a host system, with a 
plurality of other feature cards, to be individually reset 
even when the non-maskable interrupt routine 35 and 
RAM 25 are corrupt. Conventionally, when the NMI 
routine 35 and the RAM 25 are corrupt, powering the 
machine off and then on is used as the reset tool. How- 
ever, with the present invention the CPU 21 of the 
feature card can be reset even during a normally unre- 
coverable error, thereby ensuring the integrity of pro- 
grams running on the plurality of other feature cards 7 
included within host system 3. 

It should also be noted that the complexity of the 
reset mechanism of the present invention is due to the 
large variety of software, that will run on personal 
computers (e.g. feature cards 7), which can cause the 
watchdog timer to expire under normal operations. If it 
is desired to minimize the sequence of checks that occur 
before resetting the feature card CPU 21, the type of 
software supported by the present invention can be 
limited. 

Although certain preferred embodiments have been 
shown and described, it should be understood that 
many changes and modifications may be made therein 
without departing from the scope of the appended 
claims. 

What is claimed is: 

1. A method of resetting one of a plurality of feature 
cards, each having a central processing unit, that are 
included within a host machine, said method compris- 
ing: 

running an external reset routine when an error signal 
is output from one of said feature cards; 

determining the validity of an internal reset system 
within said one of said feature cards; and 

externally resetting said feature card by outputting a 
reset signal from said host machine when said inter- 
nal reset system is determined to be invalid. 

2. A method according to claim 1 wherein said step of 
running comprises monitoring, by a processor within 
said host machine, said feature card for the presence of 
said error signal. 

3. A method according to claim 2 wherein said step of 
running further comprises: 

waiting for a predetermined period of time to ensure 
that said internal reset system has an opportunity to 
reset said feature card; and 

checking for the presence of said error signal, after 
said predetermined period of time has ended. 

4. A method according to claim 3 wherein said step of 
determining further comprises performing a validity 
check on said internal reset system, when said error 
signal is present. 

5. A method according to claim 4 wherein said inter- 
nal reset system comprises a memory, reset program 
and initialization program. 

6. A method according to claim 5 wherein said step of 
determining comprises returning said feature card to 
normal operation, subsequent to said step of checking, 
when said error signal is absent. 

7. A method according to claim 6 wherein said step of 
externally resetting comprises outputting said external 
reset signal from said host processor which causes said 
feature card to run said initialization program and reset 
said feature card central processing unit. 

8. A system for resetting one of a plurality of feature 
cards, each having a central processing unit, that are 
included within a host machine, comprising: 

means for running an external reset routine when an 
error signal is output from one of said feature cards; 

means for determining the validity of an internal reset 
system within said one of said feature cards; and 

means for externally resetting said feature card when 
said internal reset system is determined to be in- 
valid. 

9. A system according to claim 8 wherein said means 
for running comprises: 

means, within said feature card, for receiving said 
error signal; and 

a central processing unit within said host machine 
capable of reading said receiving means for the 
presence of said error signal. 

10. A system according to claim 9 wherein said means 
for running further comprises: 

delay means for waiting a predetermined period of 
time to provide said internal reset system an oppor- 
tunity to reset said feature card; and 

means for reading said receiving means for the pres- 
ence of said error signal, after said predetermined 
period of time has ended. 

11. A system according to claim 10 wherein said 
means for determining performs a validity check when 
said error signal is present in said register. 

12. A system according to claim 11 wherein said 
internal reset system comprises a memory, a reset pro- 
gram and initialization program. 

13. A system according to claim 12 wherein said 
means for determining further comprises means for 
returning said feature card to normal operation, subse- 
quent to reading said receiving means, when said error 
signal is absent. 

14. A system according to claim 13 wherein said 
external reset means comprises means for outputting a 
reset signal from said host processor which causes said 
feature card to run said initialization program and reset 
said feature card central processing unit. 

* * * * * 